Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix copying on supplementary plane characters #10530

Merged
merged 1 commit into from
Feb 10, 2019

Conversation

a4lg
Copy link
Contributor

@a4lg a4lg commented Feb 9, 2019

When I copy text from PDF displayed on pdf.js, only lower 16-bit of code points are copied (upper bits [that require surrogate pair] are ignored). See an example on issue #10529.

This is because PartialEvaluator.readToUnicode method used classic String.fromCharCode (which does not support out-of-UCS2 code points) instead of ES6's String.fromCodePoint (which does).

Merging this pull request replaces String.fromCharCode to String.fromCodePoint where necessary (two of four occurrences in evaluator.js).

@timvandermeij
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Feb 9, 2019

From: Bot.io (Linux m4)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/868b3ff72df751a/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 9, 2019

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/fdf1a1b173182d7/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 9, 2019

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/868b3ff72df751a/output.txt

Total script time: 17.81 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

pdfjsbot commented Feb 9, 2019

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/fdf1a1b173182d7/output.txt

Total script time: 25.50 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/fdf1a1b173182d7/reftest-analyzer.html#web=eq.log

@timvandermeij
Copy link
Contributor

The Windows failures do not look related to this patch.

@Snuffleupagus Could you review this? Also, would a "text" reference test catch this? If so, it should be added.

@Snuffleupagus
Copy link
Collaborator

/botio-linux preview

@pdfjsbot
Copy link

pdfjsbot commented Feb 9, 2019

From: Bot.io (Linux m4)


Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/e6e5c9fe25609c8/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Feb 9, 2019

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/e6e5c9fe25609c8/output.txt

Total script time: 1.70 mins

Published

@Snuffleupagus
Copy link
Collaborator

The Windows failures do not look related to this patch.

Firefox was updated on the bot, this is the "usual" sort of fallout from that.

Also, would a "text" reference test catch this? If so, it should be added.

Yes it will[1], and a test-case is a requirement for accepting the patch.


[1] Try adding #textLayer=visible to a viewer with/without the patch, with the pdfBugEnabled pref set, and compare the output.

@a4lg
Copy link
Contributor Author

a4lg commented Feb 10, 2019

OK, I'm working on the test.

@a4lg
Copy link
Contributor Author

a4lg commented Feb 10, 2019

Oh, making the test was harder than I expected (I could not figure out what "text" tests do). I just gave up.

Still, I can ... at least regenerate test PDF file only with free/open source fonts (Source Sans Pro and Hanazono Mincho): issue10529.pdf

@a4lg
Copy link
Contributor Author

a4lg commented Feb 10, 2019

I read the "Contributing" page again and added a testcase. I'm not sure this is okay though.

@a4lg
Copy link
Contributor Author

a4lg commented Feb 10, 2019

I ran gulp makeref and noticed that some glyphs are missing on Linux (because of insufficient out-of-UCS2 coverage). I think this is still OK since it (barely) shows missing codepoint value in hex and the image after the fix looks very different.

Here is the reference image before/after the fix on Ubuntu 18.10 + additional fonts (as described in "Bots") + Firefox 65.0.

Before the fix:
issue10529-bug-ref-1

After the fix:
issue10529-fixed-ref-1

I haven't tested it on Windows but I think it will not show missing glyphs because of preinstalled PMingLiU-ExtB font.

pdf.js had a problem when copying characters on supplementary planes
(0xPPXXXX where PP is nonzero).  This is because certain methods of
PartialEvaluator use classic String.fromCharCode instead of ES6's
String.fromCodePoint.

Despite the fact that readToUnicode method *tried* to parse out-of-UCS2
code points by parsing UTF-16BE, it was inadequate because
String.fromCharCode only supports UCS-2 range of Unicode.
@a4lg
Copy link
Contributor Author

a4lg commented Feb 10, 2019

Fixed code style and squashed.

@timvandermeij
Copy link
Contributor

/botio makeref

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/17a6f063e58269a/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_makeref from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/5155dc9f2b52363/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/17a6f063e58269a/output.txt

Total script time: 16.14 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/5155dc9f2b52363/output.txt

Total script time: 22.69 mins

  • Lint: Passed
  • Make references: Passed
  • Check references: Passed

@timvandermeij timvandermeij merged commit dfe7d9b into mozilla:master Feb 10, 2019
@timvandermeij
Copy link
Contributor

Thank you for fixing this!

@fgilio
Copy link

fgilio commented Feb 13, 2019

Hi! I have a little question: Does this execute when copying text in the viewer or when building the text layer?

@timvandermeij
Copy link
Contributor

This runs when the text layer is built and therefore it impacts the way the text can be copied.

@fgilio
Copy link

fgilio commented Feb 13, 2019

Awesome, thanks for clarifying!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants